
    Calculating the random guess scores of multiple-response and matching test items

    For achievement tests, the guess score is often used as a baseline for the lowest possible grade in score-to-grade transformations and in setting cut scores. For test item types such as multiple-response, matching and drag-and-drop, determining the guess score requires more elaborate calculations than the more straightforward calculation for True-False and multiple-choice item formats. Methods for determining the guess score are presented for several variants of multiple-response and matching items under dichotomous and polytomous scoring, and are illustrated with practical applications. The implications for theory and practice are discussed.
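
    The abstract above does not spell out its formulas, so the following is only a minimal sketch of how such guess scores can be computed under simple, explicitly assumed guessing models; it is not the paper's own method. The function names and the guessing assumptions (a fixed number of options ticked at random, a fair coin flip per option, a random one-to-one assignment in matching) are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def mr_dichotomous_guess(n_options: int, n_correct: int) -> Fraction:
    """Chance of full credit on a multiple-response item (all-or-nothing scoring).

    Assumption: the examinee knows how many options are correct and ticks
    exactly that many uniformly at random.
    """
    return Fraction(1, comb(n_options, n_correct))


def mr_polytomous_guess(n_options: int) -> Fraction:
    """Expected points with one point per correctly classified option.

    Assumption: each option is ticked or left blank by a fair coin flip,
    so each option is classified correctly with probability 1/2.
    """
    return Fraction(n_options, 2)


def matching_dichotomous_guess(n_pairs: int) -> Fraction:
    """Chance of matching every pair correctly (all-or-nothing scoring).

    Assumption: a uniformly random one-to-one assignment of n_pairs
    stems to n_pairs options.
    """
    return Fraction(1, factorial(n_pairs))


def matching_polytomous_guess(n_pairs: int) -> Fraction:
    """Expected number of correct pairs with one point per correct pair.

    Computed by brute-force enumeration of all one-to-one assignments,
    so it is only practical for small n_pairs.
    """
    total_correct = 0
    for assignment in permutations(range(n_pairs)):
        total_correct += sum(1 for stem, option in enumerate(assignment) if stem == option)
    return Fraction(total_correct, factorial(n_pairs))


if __name__ == "__main__":
    print("MR, 5 options, 2 correct, dichotomous:", mr_dichotomous_guess(5, 2))
    print("MR, 4 options, polytomous:", mr_polytomous_guess(4))
    print("Matching, 4 pairs, dichotomous:", matching_dichotomous_guess(4))
    print("Matching, 4 pairs, polytomous:", matching_polytomous_guess(4))
```

    Exact fractions are used rather than floats so that the guess score can be compared directly with a cut score without rounding noise. Other guessing models (for example, penalties for wrong selections) would give different values.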

    EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

    This article introduces a new language-independent approach for creating a large-scale high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology to Arabic tweets resulted in EveTAR, the first freely-available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets.

    Novel image analysis approach for quantifying expression of nuclear proteins assessed by immunohistochemistry: application to measurement of oestrogen and progesterone receptor levels in breast cancer

    INTRODUCTION: Manual interpretation of immunohistochemistry (IHC) is a subjective, time-consuming and variable process, with an inherent intra-observer and inter-observer variability. Automated image analysis approaches offer the possibility of developing rapid, uniform indicators of IHC staining. In the present article we describe the development of a novel approach for automatically quantifying oestrogen receptor (ER) and progesterone receptor (PR) protein expression assessed by IHC in primary breast cancer. METHODS: Two cohorts of breast cancer patients (n = 743) were used in the study. Digital images of breast cancer tissue microarrays were captured using the Aperio ScanScope XT slide scanner (Aperio Technologies, Vista, CA, USA). Image analysis algorithms were developed using MATLAB 7 (MathWorks, Apple Hill Drive, MA, USA). A fully automated nuclear algorithm was developed to discriminate tumour from normal tissue and to quantify ER and PR expression in both cohorts. Random forest clustering was employed to identify optimum thresholds for survival analysis. RESULTS: The accuracy of the nuclear algorithm was initially confirmed by a histopathologist, who validated the output in 18 representative images. In these 18 samples, an excellent correlation was evident between the results obtained by manual and automated analysis (Spearman's rho = 0.9, P < 0.001). Optimum thresholds for survival analysis were identified using random forest clustering. This revealed 7% positive tumour cells as the optimum threshold for the ER and 5% positive tumour cells for the PR. Moreover, a 7% cutoff level for the ER predicted a better response to tamoxifen than the currently used 10% threshold. Finally, linear regression was employed to demonstrate a more homogeneous pattern of expression for the ER (R = 0.860) than for the PR (R = 0.681).
CONCLUSIONS: In summary, we present data on the automated quantification of the ER and the PR in 743 primary breast tumours using a novel unsupervised image analysis algorithm. This novel approach provides a useful tool for the quantification of biomarkers on tissue specimens, as well as for objective identification of appropriate cutoff thresholds for biomarker positivity. It also offers the potential to identify proteins with a homogeneous pattern of expression.

    GPAQ-R: development and psychometric properties of a version of the general practice assessment questionnaire for use for revalidation by general practitioners in the UK.

    BACKGROUND: The General Practice Assessment Questionnaire (GPAQ) has been widely used to assess patient experience in general practice in the UK since 2004. In 2013, new regulations were introduced by the General Medical Council (GMC) requiring UK doctors to undertake periodic revalidation, which includes assessment of patient experience for individual doctors. We describe the development of a new version of GPAQ, GPAQ-R, which addresses the GMC's requirements for revalidation as well as additional NHS requirements for surveys that GPs may need to carry out in their own practices. METHODS: Questionnaires were given out by doctors or practice staff after routine consultations in line with the guidance given by the General Medical Council for surveys to be used for revalidation. Data analysis and practice reports were provided independently. RESULTS: Data were analysed for questionnaires from 7258 patients relating to 164 GPs in 29 general practices. Levels of missing data were generally low (typically 4.5-6%). The number of returned questionnaires required to achieve reliability of 0.7 was around 35 for individual doctor communication items and 29 for a composite score based on doctor communication items. This suggests that the responses to GPAQ-R had similar reliability to the GMC's own questionnaire, and we recommend that 30 completed GPAQ-R questionnaires be treated as sufficient for revalidation purposes. However, where an initial screen raises concern, the survey might be repeated with 50 completed questionnaires in order to increase reliability. CONCLUSIONS: GPAQ-R is a development of a well-established patient experience questionnaire used in general practice in the UK since 2004. This new version can be recommended for use in order to meet the UK General Medical Council's requirements for surveys to be used in revalidation of doctors.
It also meets the needs of GPs to ask about patient experience relating to aspects of practice care that are not specific to individual general practitioners (e.g. receptionists, telephone access), which meets other survey requirements of the National Health Service in England. Use of GPAQ-R has the potential to reduce the number of surveys that GPs need to carry out in their practices to meet the various regulatory requirements which they face.
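
    The abstract reports how many questionnaires are needed to reach a reliability of 0.7 but not how that number was derived. One common way to relate the number of responses to the reliability of their mean is the Spearman-Brown formula; the sketch below assumes that model, which may differ from the analysis actually used in the study. The single-response reliability of 0.075 is a hypothetical value chosen for illustration and is not taken from the abstract.

```python
from math import ceil


def spearman_brown(single_reliability: float, n_responses: int) -> float:
    """Reliability of the mean of n_responses, given the reliability of one response."""
    r = single_reliability
    return n_responses * r / (1 + (n_responses - 1) * r)


def questionnaires_needed(single_reliability: float, target: float) -> int:
    """Smallest number of responses whose mean reaches the target reliability.

    Obtained by solving the Spearman-Brown formula for n and rounding up.
    """
    r = single_reliability
    return ceil(target * (1 - r) / (r * (1 - target)))


if __name__ == "__main__":
    hypothetical_r = 0.075  # illustrative assumption, not a value from the study
    n = questionnaires_needed(hypothetical_r, target=0.7)
    print("questionnaires needed:", n)
    print("reliability at that n:", round(spearman_brown(hypothetical_r, n), 3))
    print("reliability at n = 50:", round(spearman_brown(hypothetical_r, 50), 3))
```

    Under this assumed model, reliability rises with diminishing returns as more questionnaires are collected, which is consistent with the abstract's suggestion to repeat the survey with a larger sample when an initial screen raises concern.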

    Visual recognition of gestures in a meeting to detect when documents being talked about are missing

    Meetings frequently involve discussion of documents and can be significantly affected if a document is absent. An agent system capable of spontaneously retrieving a document at the point it is needed would have to judge whether a meeting is talking about a particular document and whether that document is already present. We report the exploratory application of agent techniques for making these two judgements. To obtain examples from which an agent system can learn, we first conducted a study of participants making these judgements with video recordings of meetings. We then show that interactions between hands and paper documents in meetings can be used to recognise when a document being talked about is not to hand. The work demonstrates the potential for multimodal agent systems using these techniques to learn to perform specific, discourse-level tasks during meetings.

    Breast cancer risk reduction: is it feasible to initiate a randomised controlled trial of a lifestyle intervention programme (ActWell) within a national breast screening programme?

    BACKGROUND: Breast cancer is the most commonly diagnosed cancer and the second cause of cancer deaths amongst women in the UK. The incidence of the disease is increasing and is highest in women from least deprived areas. It is estimated that around 42% of the disease in post-menopausal women could be prevented by increased physical activity and reductions in alcohol intake and body fatness. Breast cancer control endeavours focus on national screening programmes but these do not include communications or interventions for risk reduction. This study aimed to assess the feasibility of delivery, indicative effects and acceptability of a lifestyle intervention programme initiated within the NHS Scottish Breast Screening Programme (NHSSBSP). METHODS: A 1:1 randomised controlled trial (RCT) of the 3-month ActWell programme (focussing on body weight, physical activity and alcohol) versus usual care was conducted in two NHSSBSP sites between June 2013 and January 2014. Feasibility assessments included recruitment, retention, and fidelity to protocol. Indicative outcomes were measured at baseline and 3-month follow-up (body weight, waist circumference, eating and alcohol habits and physical activity). At study end, a questionnaire assessed participant satisfaction and qualitative interviews elicited women's, coaches' and radiographers' experiences. Statistical analysis used Chi squared tests for comparisons in proportions and paired t tests for comparisons of means. Linear regression analyses were performed, adjusted for baseline values, with group allocation as a fixed effect. RESULTS: A pre-set recruitment target of 80 women was achieved within 12 weeks and 65 (81%) participants (29 intervention, 36 control) completed 3-month assessments.
Mean age was 58 ± 5.6 years, mean BMI was 29.2 ± 7.0 kg/m² and many (44%) reported a family history of breast cancer. The primary analysis (baseline body weight adjusted) showed a significant between group difference favouring the intervention group of 2.04 kg (95% CI −3.24 kg to −0.85 kg). Significant, favourable between group differences were also detected for BMI, waist circumference, physical activity and sitting time. Women rated the programme highly and 70% said they would recommend it to others. CONCLUSIONS: Recruitment, retention, indicative results and participant acceptability support the development of a definitive RCT to measure long term effects. TRIAL REGISTRATION: The trial was registered with Current Controlled Trials (ISRCTN56223933).

    On the scent of sexual attraction

    A study in the current issue of BMC Biology has identified a mouse major urinary protein as a pheromone that attracts female mice to male urine marks and induces a learned attraction to the volatile urinary odor of the producer. See research article http://www.biomedcentral.com/1741-7007/8/7

    A hypothetico-deductive approach to assessing the social function of chemical signalling in a non-territorial solitary carnivore

    The function of chemical signalling in non-territorial solitary carnivores is still relatively unclear. Studies on territorial solitary and social carnivores have highlighted odour capability and utility; however, the social function of chemical signalling in wild carnivore populations operating dominance hierarchy social systems has received little attention. We monitored scent marking and investigatory behaviour of wild brown bears Ursus arctos, to test multiple hypotheses relating to the social function of chemical signalling. Camera traps were stationed facing bear ‘marking trees’ to document behaviour by different age-sex classes in different seasons. We found evidence to support the hypothesis that adult males utilise chemical signalling to communicate dominance to other males throughout the non-denning period. Adult females did not appear to utilise marking trees to advertise oestrous state during the breeding season. The function of marking by subadult bears is somewhat unclear, but may be related to the behaviour of adult males. Subadults investigated trees more often than they scent marked during the breeding season, which could be a result of an increased risk from adult males. Females with young showed an increase in marking and investigation of trees outside of the breeding season. We propose the hypothesis that females engage their dependent young with marking trees from a young age, at a relatively ‘safe’ time of year. Memory, experience, and learning at a young age may all contribute towards odour capabilities in adult bears.

    Profiling quality of care: Is there a role for peer review?

    BACKGROUND: We sought to develop a more reliable structured implicit chart review instrument for use in assessing the quality of care for chronic disease and to examine if ratings are more reliable for conditions in which the evidence base for practice is more developed. METHODS: We conducted a reliability study in a cohort with patient records including both outpatient and inpatient care as the objects of measurement. We developed a structured implicit review instrument to assess the quality of care over one year of treatment. 12 reviewers conducted a total of 496 reviews of 70 patient records selected from 26 VA clinical sites in two regions of the country. Each patient had between one and four conditions specified as having a highly developed evidence base (diabetes and hypertension) or a less developed evidence base (chronic obstructive pulmonary disease or a collection of acute conditions). Multilevel analysis that accounts for the nested and cross-classified structure of the data was used to estimate the signal and noise components of the measurement of quality and the reliability of implicit review. RESULTS: For COPD and a collection of acute conditions the reliability of a single physician review was quite low (intra-class correlation = 0.16–0.26) but comparable to most previously published estimates for the use of this method in inpatient settings. However, for diabetes and hypertension the reliability is significantly higher at 0.46. The higher reliability is a result of the reviewers collectively being able to distinguish more differences in the quality of care between patients (p < 0.007) and not due to less random noise or individual reviewer bias in the measurement. For these conditions the level of true quality (i.e. the rating of quality of care that would result from the full population of physician reviewers reviewing a record) varied from poor to good across patients. 
CONCLUSIONS: For conditions with a well-developed quality of care evidence base, such as hypertension and diabetes, a single structured implicit review to assess the quality of care over a period of time is moderately reliable. This method could be a reasonable complement or alternative to explicit indicator approaches for assessing and comparing quality of care. Structured implicit review, like explicit quality measures, must be used more cautiously for illnesses for which the evidence base is less well developed, such as COPD and acute, short-course illnesses.